Learning to Solve Constraint Problems

Authors

  • Susan L. Epstein
  • Smiljana Petrovic
Abstract

This paper explains why learning to solve constraint problems is so difficult, and describes a set of methods that has been effective on a broad variety of problem classes. The primary focus is on learning an effective search algorithm as a weighted subset of ordering heuristics. Experiments show the impact of several novel techniques on a variety of problems.

When a planning problem is cast as a constraint satisfaction problem (CSP), it can use the representational expressiveness and inference power inherent in constraint programming (Nareyek et al. 2005). If such an encoding lacks the necessary planning knowledge, a program might learn effective solution methods. Our thesis is that it is possible to learn to solve constraint satisfaction problems from experience. In this scenario, a program is given a set of CSPs and a set of search heuristics. It is then expected to learn an effective search algorithm, represented as a weighted combination of some subset of those heuristics. From our perspective, the scenario's principal challenge is a plethora of purportedly "good" heuristics: heuristics to select variables or values, heuristics for inference, and heuristics to determine when to restart. The focus here is on heuristics for traditional global search. After fundamental definitions and related work, this paper addresses the differences among search-order heuristics, the power of mixtures of heuristics, and fundamental issues in learning to solve a class of CSPs. It then describes two algorithms for learning such mixtures, and additional learning methods that speed learning and often improve search performance.

Background and related work

A CSP is a set of variables, each with a domain of values, and a set of constraints, expressed as relations over subsets of those variables. CSP papers often present results on a class of CSPs, that is, a set of putatively similar problems. For example, a class of model B problems is characterized by <n, m, d, t>, where n is the number of variables, m the maximum domain size, d the density (fraction of edges out of the n(n-1)/2 possible edges), and t the tightness (fraction of possible value pairs that each constraint excludes) (Gomes et al. 2004). A problem class can also mandate some non-random structure on its problems. For example, a composed problem consists of a subgraph called its central component loosely joined to one or more subgraphs called satellites (Aardal et al. 2003).

In a binary CSP, all constraints are on at most two variables. A binary CSP can be represented as a constraint graph, where vertices correspond to the variables (labeled by their domains) and each edge represents a constraint between its respective variables. Although the work reported here is on binary CSPs, in principle that is not a restriction. A solution to a CSP is an instantiation of all its variables that satisfies all the constraints. Here, search for a solution iteratively selects a variable and assigns it a value from its domain, producing a search node. After each assignment, some form of inference detects values that are incompatible with the current instantiation. We use the MAC-3 inference algorithm to maintain arc consistency during search (Sabin et al. 1997). MAC-3 temporarily removes currently unsupportable values to calculate dynamic domains that reflect the current instantiation. If every value in some variable's domain is inconsistent (violates some constraint), then the current instantiation cannot be extended to a solution and some retraction method is applied.
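To ground these definitions, the sketch below represents a binary CSP as a constraint graph and performs the kind of support filtering on which maintained arc consistency relies. It is a minimal illustration in the spirit of AC-3, not ACE's MAC-3 implementation, and every name in it is invented for the example.

class BinaryCSP:
    def __init__(self, domains, constraints):
        # domains: {variable: set of values}; these act as the dynamic domains.
        # constraints: {(x, y): predicate(vx, vy) -> bool}, one entry per edge.
        self.domains = domains
        self.constraints = constraints

    def neighbors(self, x):
        out = set()
        for a, b in self.constraints:
            if a == x:
                out.add(b)
            elif b == x:
                out.add(a)
        return out

    def consistent(self, x, vx, y, vy):
        if (x, y) in self.constraints:
            return self.constraints[(x, y)](vx, vy)
        if (y, x) in self.constraints:
            return self.constraints[(y, x)](vy, vx)
        return True  # no constraint between x and y

def revise(csp, x, y):
    """Remove values of x that have no support in y's dynamic domain."""
    removed = False
    for vx in list(csp.domains[x]):
        if not any(csp.consistent(x, vx, y, vy) for vy in csp.domains[y]):
            csp.domains[x].discard(vx)
            removed = True
    return removed

def propagate(csp, arcs):
    """AC-3-style loop over arcs; False signals a domain wipe-out, so the
    current instantiation cannot be extended and retraction is needed."""
    queue = set(arcs)
    while queue:
        x, y = queue.pop()
        if revise(csp, x, y):
            if not csp.domains[x]:
                return False
            queue.update((z, x) for z in csp.neighbors(x) if z != y)
    return True

# Example: two variables constrained to differ.
csp = BinaryCSP({"X": {1, 2}, "Y": {2}}, {("X", "Y"): lambda a, b: a != b})
propagate(csp, [("X", "Y"), ("Y", "X")])  # leaves X with {1}, Y with {2}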
Retraction here is chronological backtracking: it prunes the subtree (digression) rooted at the inconsistent node and withdraws the most recent value assignment(s).

All data was generated with ACE (the Adaptive Constraint Engine). ACE learns a customized combination of pre-specified search heuristics for a class of CSPs (Epstein et al. 2005a). It attempts to solve some sequence of these problems within a specified resource limit (the learning phase). Then learning is turned off and the program attempts to solve a new sequence of problems drawn from the same set (the testing phase). A run is a learning phase followed by a testing phase. The resource limit is measured in steps (the number of variable selections and value selections). The ability of a program to learn to solve CSPs is gauged here by the number of problems solved, the number of search steps, and the search-tree size in nodes, averaged over a set of runs.

The premise of learning is that data can be calculated, stored, and applied to improve performance. Thus, it is reasonable to learn how to solve a class of problems only if the effort expended, both to learn and to apply the learned knowledge, can be justified by a frequent need to solve similar problems. For easy problems, learning is a waste of resources; the search algorithm should recognize and apply a simple, effective approach from its arsenal. On a class of more challenging problems, learning may be worthwhile if one expects to solve such problems often. Indeed, proponents of any new search algorithm inherently argue that most CSPs are enough like one another that success on some set of classes, as described in their papers, bodes well for other, as yet untested classes. On a class of hard problems, learning is appealing because the given search algorithm is slower than one would wish, and thus the class is "hard" for the search algorithm at hand.

Historically, most learning for constraint solving has been on an individual problem rather than on an entire class. Such learning has primarily focused either on inconsistent partial instantiations that should be avoided or on constraints that provoke retraction (Dechter et al. 1987; Dechter 2003; Boussemart et al. 2004). Other work has learned weights for individual assignments (Refalo 2004), alternated among methods while solving an individual problem (Borrett et al. 1996), identified problematic edges with a preliminary local search (Ruml 2001; Eisenberg et al. 2003; Hoos et al. 2004), learned global constraints (Bessière et al. 2001; Bessière 2007), or addressed optimization problems and incomplete methods (Caseau et al. 1999; Caseau et al. 2004; Carchrae et al. 2005).

The argument for multiple heuristics

Despite enthusiasm for them in the CSP literature, ordering heuristics (those that select variables and values for them) display surprisingly uneven performance. Consider, for example, the performance of the variable-selection heuristics in Table 1. (Definitions for all heuristics appear in the Appendix.) Even well-trusted individual heuristics such as these vary dramatically in their performance. For example, max-weighted-degree (Boussemart et al. 2004) is among the best individual heuristics when the number of variables is substantially larger than the maximum domain size (e.g., 50-10). It appears to be less effective, however, when there are more potential values than variables (e.g., 20-30). Perhaps more surprising is that the opposite of a popular heuristic may be considerably more effective than the original.
Table 1: Search tree size under individual heuristics on 50 problems from each of three randomly generated model B classes: <50, 10, 0.38, 0.2>, <20, 30, 0.444, 0.5>, and <30, 8, 0.26, 0.34> (referred to hereafter as 50-10, 20-30, and 30-8, respectively).

  Heuristic                 30-8    20-30    50-10
  min-domain                 563   10,411   51,347
  max-degree                 206    5,267   46,347
  max-forward-degree         220   10,150   43,890
  min-domain/degree          234    4,194   35,175
  max-weighted-degree        223    5,897   30,956
  min-dom/dynamic-deg        211    3,942   30,791
  min-dom/weighted-deg       205    4,090   30,025

Let a metric be a function from a set of choices (variables or values) to the real numbers. A metric returns a score for each choice. An ordering heuristic is thus a preference for one extreme or the other of the scores returned by its metric. A dual for a heuristic reverses the import of its metric (e.g., max-domain is the dual of min-domain). Duals of popular heuristics may outperform them on real-world problems and on problems with non-random structure (Petrie et al. 2003; Lecoutre et al. 2004; Otten et al. 2006). For example, each composed problem in Comp has a model B central component from <22, 6, 0.6, 0.1> linked to a single model B satellite from <8, 6, 0.72, 0.45> by edges with density 0.115 and tightness 0.05. The central component is substantially larger, with lower tightness and lower density than its satellite. These CSPs are particularly difficult for some traditional heuristics. For example, max-degree tends to select variables from the central component, while the decidedly untraditional min-degree tends to prefer variables from the satellite and thereby detects inconsistencies much earlier. Table 2 shows how three traditional heuristics and their duals fare on Comp. Surprisingly, the simplest duals do by far the best. This is of particular concern because the structural features of Comp often appear in real-world problems.

Table 2: Performance of three popular heuristics and their duals on 50 Comp problems (described in the text) under a 100,000-step limit. Observe how much better the duals perform on problems from this class.

  Heuristic             Unsolved problems      Steps
  Max degree                            9   19901.76
  Min degree                            0      64.60
  Max forward-degree                    4   10590.64
  Min forward-degree                    0      64.50
  Min domain/degree                     7   15558.28
  Max domain/degree                     4   10922.82

In practice, a good mixture of heuristics can outperform even the best individual one, as Table 3 demonstrates. The first line shows the best performance achieved by any traditional single heuristic from Table 1. The second line of Table 3 shows that a good pair of heuristics, one for variable ordering and the other for value ordering, can perform significantly better than an individual heuristic. Nonetheless, the identification of such a pair is not trivial. For example, max-product-domain-value better complements min-domain/dynamic-degree than it does max-weighted-degree. The last line demonstrates that combinations of more than two heuristics can further improve performance.

Table 3: Search tree size under individual heuristics and under mixtures of heuristics on three classes of problems. ACE learns a different, high-performing mixture of more than two heuristics for each of these classes.

  Mixture                                               30-8    20-30    50-10
  The best heuristic from Table 1                        205    3,942   30,025
  Min-dom/dynamic-degree + Max-product-domain-value      156    2,764   15,091
  Max-weighted-degree + Max-product-domain-value         179    3,892   22,273
  Mixture found by ACE                                   141    2,502   12,120
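To make the metric, heuristic, and dual vocabulary above concrete, the sketch below derives a popular heuristic and its dual from a single metric (dynamic-domain size), using the BinaryCSP representation sketched earlier. The names are illustrative and not drawn from ACE.

def domain_size(csp, variable):
    """Metric: score a candidate variable by its current (dynamic) domain size."""
    return len(csp.domains[variable])

def make_heuristic(metric, prefer_max):
    """Build an ordering heuristic from a metric and a preferred extreme."""
    def choose(csp, candidates):
        extreme = max if prefer_max else min
        return extreme(candidates, key=lambda c: metric(csp, c))
    return choose

min_domain = make_heuristic(domain_size, prefer_max=False)  # the popular heuristic
max_domain = make_heuristic(domain_size, prefer_max=True)   # its dual

Because both heuristics share one metric, supplying a heuristic together with its dual costs little more than flipping the preferred extreme.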
Given these results, a program required to learn effective search without knowledge about problem structure should be provided with many popular heuristics, along with their duals. ACE's heuristics, each with its own metric, are gleaned from the CSP literature. To make a decision during search, ACE uses a weighted mixture of expressions of preference from a large number of such heuristics. This is a difficult task.

Why learning on a class of problems is hard

Without an instructor to provide examples of good and bad decisions, learning in our scenario is self-supervised; that is, the learner must assess both the quality of its own actions and the adequacy of its model of the environment. The (state, decision) pairs from a solver's trace provide self-generated training instances. Reinforcement learning rewards or penalizes heuristics based on their ability to provide good search advice (Sutton & Barto, 1998), but in this context it faces a variety of difficulties.

A solution path may not provide good training instances. Since every variable must be assigned a value, any variable ordering must eventually lead to a solution if the problem is solvable. Nonetheless, some variable orders generate substantially fewer nodes, and may be more effective by several orders of magnitude; those are the ones we want our learner to produce. Self-generated training instances, however, may not necessarily represent good variable choices. Moreover, any variable ordering can lead to an error-free solution if each chosen value satisfies all constraints. As a result, the ease with which a solution is found is not a reliable criterion for evaluating the quality of the decisions that led to it.

The difficulty of a problem is hard to assess. Training instances must be drawn from the same population as testing instances, but a class of CSPs is only putatively similar. For a given search algorithm, in some circumstances the distribution of difficulty within a class is heavy-tailed (Hulubei et al. 2005). Thus some problems will be extremely difficult, while others will be manageable, or even easy. When a learner confronts a CSP from a class, it is hard to predict how amenable the particular problem will be to the search algorithm. This issue arises whether or not the problems are "hard" in some fundamental sense. Variation in difficulty is not noise; it is inherent in the problems themselves and in their interaction with heuristics. In learning to solve CSPs, the skewed distribution of difficulty within a problem class (the result, perhaps, of an inappropriate heuristic) poses a particular challenge, one only exacerbated by more difficult classes.

The difficulty of a problem class is hard to assess. In model B problems, for fixed values of n and m, there are value combinations for d and t that make the entire class of problems difficult in some fundamental sense (the phase transition) (Cheeseman et al. 1991). Even in a class at the phase transition (as are the classes in Table 3), there may be a wide range of difficulty, so that individual problems could give a misleading picture of the class as a whole. In theory one could assess the difficulty of a class by running standard algorithms on a sample drawn from it, and thereby characterize the relative difficulty of problems with different parameter values. More generally, however, particularly in real-world contexts, this may not be possible beforehand. In such situations, the previously described difficulties of learning from learner-generated solution paths may be magnified.
The severity of an error is costly to assess. An error is a value assignment that is eventually retracted during search. Typically, even a handcrafted CSP solver arrives at a solution only after a lengthy series of errors. To penalize incorrect decisions appropriately, one should assess the severity of the error. Effectively, any incorrect decision creates an unsolvable problem. When a good solver errs, it will quickly discover its error. Gauging the effectiveness of error recovery, however, requires exploration of every possible ordering of value assignments in the digression, an unreasonable computational burden.

Errors may not be immediately apparent. An important issue in credit/blame assignment for reinforcement learning is that most retractions appear at some distance from the root of the search tree. In fact, even for hard but solvable problems, there are usually relatively few retractions at the top of the search tree, even with maintained arc consistency. Retractions often begin only after several decisions have been made. In such searches, the impact of bad decisions, especially variable selections, appears only after several more decisions have been made. As a result, it is difficult to assign blame to the true culprits.

Implications for learning. In summary, given a set of search traces, it is difficult to gauge how representative they are of effective search, difficult to identify sources of inefficiency from errors alone, and difficult to gauge how severe the errors are, how hard an individual problem is (despite its class designation), and even the degree to which a solution is based on good decisions. Moreover, since CSP solution is NP-complete, there can be no "gold standard" by which to judge the quality of a heuristic; the perfect search path must be assumed to be unobtainable on a regular basis. Clearly, for a program expected to learn an effective search algorithm based only on its own problem-solving experience, the interpretation of success and failure is not straightforward. Even the worst heuristic can solve some problems quickly. If such problems occur early in learning, then an ineffective heuristic will deceptively appear to be effective. If poor heuristics are reinforced early in learning, they will inevitably lead to poor performance on some subsequent problems.

Learning a mixture of heuristics

ACE is based on FORR, an architecture for the development of expertise from multiple heuristics (Epstein 1994). ACE learns a customized weighted mixture of pre-specified heuristics for any given class. Guided by its ordering heuristics (here called Advisors), ACE solves problems in a given class and uses that experience to learn a weight profile (a set of weights for the Advisors). To select a variable or a value, ACE consults its Advisors. Each Advisor A_i expresses the strength s_ij of its preference for choice c_j. Based on the weight profile, the choice with the highest weighted sum of Advisor strengths is selected:
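In symbols, with w_i the learned weight of Advisor A_i, this amounts to the weighted vote below (a reconstruction from the surrounding description, using the s_ij and c_j notation defined above):

    c^{*} = \operatorname{arg\,max}_{c_j} \sum_{i} w_i \, s_{ij}

A minimal sketch of the same rule in code, with invented names rather than ACE's own interfaces:

def weighted_selection(advisors, weights, choices):
    """Pick the choice with the largest weighted sum of Advisor strengths."""
    # advisors[i](choice) returns the strength s_ij that Advisor A_i assigns
    # to choice c_j; weights[i] is the learned weight w_i for that Advisor.
    def total(choice):
        return sum(w * a(choice) for a, w in zip(advisors, weights))
    return max(choices, key=total)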
